ABSTRACT

The leading terms of the bias of the ratio and regression estimators are
known to be of order $n^{-1}$. We use a finite population decomposition to
give a different expression for the leading term of the bias. Fitting a
regression line to the finite population, we show that the intercept of the
regression line causes the bias of the ratio estimator. Fitting a quadratic
regression to the finite population, we show that the bias of the regression
estimator is caused by the quadratic term. We also give a compact and
intuitive formula for the leading term of the bias of the weighted regression
estimators for p auxiliary variables. Using the same decomposition, we can
rewrite the variance formulae of some popular estimators in terms of some
simple and interpretable population characteristics. We prove that under
simple random sampling the unweighted regression estimator is the most
efficient estimator among the consistent weighted regression estimators. The
extension to p auxiliary variates is also given.
SIGNIFICANCE AND EXPLANATION

In survey sampling, we often make use of an auxiliary covariate to improve
the precision of estimating the population mean of a character of interest.
Ratio and regression estimators are two commonly used estimators. It is well
known that they have a bias of small order. We give a new interpretation of
the bias using a finite population decomposition. Fitting a regression line
to the finite population, we show that the intercept of the regression line
causes the bias of the ratio estimator. Fitting a quadratic regression to the
finite population, we show that the bias of the regression estimator is
caused by the quadratic term. We also give a compact and intuitive formula
for the leading term of the bias of the weighted regression estimators for
p auxiliary variables. We also prove that under simple random sampling the
unweighted regression estimator is the most efficient estimator.
BIAS AND EFFICIENCY OF THE CONSISTENT WEIGHTED REGRESSION ESTIMATORS

IN  FINITE  POPULATION  SAMPLING 


Lih-Yuan  Deng 


1.  Introduction 


Consider a finite population consisting of N units with values $(y_i, x_i)$,
$i = 1, 2, \ldots, N$, where $x_i$ is positive and known. A simple random
sample of size n is chosen without replacement from the population. Denote
the sample and population means of y and x by $\bar y$, $\bar x$ and
$\bar Y$, $\bar X$ respectively. The ratio estimator
$\bar y_R = \bar X \bar y / \bar x$ and the linear regression estimator
$\bar y_{lr} = \bar y - b(\bar x - \bar X)$ are the most commonly used
estimators of $\bar Y$, where
$b = \sum_1^n (x_i - \bar x)(y_i - \bar y) / \sum_1^n (x_i - \bar x)^2$ is
the sample regression coefficient of y on x.

The ratio estimator is known to have a bias of order $n^{-1}$. Durbin (1959),
Beale (1962) and Tin (1965) proposed several estimators to reduce its
$n^{-1}$ bias. In Section 2, we use a finite population decomposition to give
a different expression of the leading term of the bias. Fitting a regression
line of y on x to the finite population, we show that the intercept of the
regression line causes the leading term of the bias.

Like the ratio estimator, the linear regression estimator is also a biased
estimator. To study the bias of the regression estimator we introduce a
different decomposition. By fitting a quadratic regression to the finite
population, we show in Section 3 that the leading term of the bias is caused
by the quadratic term. The sign of the bias is also determined by the
coefficient of the quadratic term. In fact, we show that a negative
(positive) coefficient indicates a slight overestimation (underestimation) of
$\bar Y$.

Konijn (1973) gave an explicit but complicated expression of the leading term
of the bias for the bivariate regression estimator. In Section 4, we give a
compact and more intuitive formula for the leading term of the bias of the
p-dimensional weighted regression estimators.

The underlying models for estimators like $\bar y$, $\bar y_R$,
$\bar y_{lr}$, etc. are well known in survey sampling. A better understanding
of the model for which each estimator is used would help samplers to choose
the 'right' estimator. In Section 5, a finite population decomposition is
introduced to study the effect of the 'model deviation' on the performance of
estimators such as the mean-per-unit ($\bar y$), ratio ($\bar y_R$) and
regression ($\bar y_{lr}$) estimators. Using the same decomposition, we can
rewrite the variance formulae of $\bar y$, $\bar y_R$ and $\bar y_{lr}$ in
terms of some simple and interpretable population characteristics, so that
the efficiency comparison can be easily made. In particular, we show that
$\bar y_{lr}$ is at least as efficient as $\bar y_R$, and that they are
equally efficient if the intercept of the population regression line is zero.
We also prove that under simple random sampling the unweighted regression
estimator is the most efficient estimator. A natural extension to p variates
is given in Section 6. We show that the unweighted multivariate regression
estimator is more efficient than the weighted ones. An intuitive argument is
also given.


2.  Bias  of  the  Ratio  Estimator 


In general, the ratio estimator has bias and variance of the same order
$n^{-1}$. Hence, in practice the bias is usually unimportant in large
samples, since the squared bias, of order $n^{-2}$, is negligible relative to
the variance. For small sample problems, such as stratified sampling with
many strata where the separate ratio estimator is used in each stratum with a
small sample size, the bias of the ratio estimator becomes more important.
Cochran (1977) pointed out that in surveys with many strata and small samples
in each stratum, if the separate ratio estimator seems appropriate, it may be
useful to modify the ratio estimator so that it is unbiased or subject to a
bias of smaller order.

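Hartley and Ross (1954) proposed an unbiased estimator

$$\bar y_{HR} = \bar r\,\bar X + \frac{n(N-1)}{(n-1)N}\,(\bar y - \bar r\,\bar x), \tag{2.1}$$

where

$$\bar r = \frac{1}{n}\sum_1^n r_i = \frac{1}{n}\sum_1^n \frac{y_i}{x_i}. \tag{2.2}$$
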

Mickey (1959) extended Hartley and Ross's ideas to get another unbiased
estimator

$$\bar y_M = \bar R\,\bar X + \frac{n(N-n+1)}{N}\,(\bar y - \bar R\,\bar x), \tag{2.3}$$

where

$$\bar R = \frac{1}{n}\sum_1^n \frac{n\bar y - y_i}{n\bar x - x_i}. \tag{2.4}$$


Lahiri(1951)  showed  that  the  ordinary  ratio  estimator  is  unbiased  under  an  unequal 
probability  sampling  scheme. 

There are several methods available for reducing the bias to order $n^{-2}$.
The first is the jackknife estimator of $\bar y_R$, based on the jackknife
method due to Quenouille (1956); Durbin (1959) was the first to propose the
jackknife method for the ratio. The method can be applied to a broad class of
statistical problems in which the original estimator has a bias of order
$n^{-1}$. Beale (1962) proposed the following estimator for bias reduction:


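$$\bar y_B = \bar X\,\frac{\bar y + \frac{1-f}{n}\,\frac{s_{xy}}{\bar x}}{\bar x + \frac{1-f}{n}\,\frac{s_x^2}{\bar x}} = \bar y_R\,\frac{1 + \frac{1-f}{n}\,c_{xy}}{1 + \frac{1-f}{n}\,c_{xx}}, \tag{2.5}$$

where

$$c_{xy} = \frac{s_{xy}}{\bar x\,\bar y}, \qquad c_{xx} = \frac{s_x^2}{\bar x^2}, \tag{2.6}$$

and $s_x^2$, $s_{xy}$ denote the sample variance of x and the sample
covariance of x and y.
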

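Tin (1965) proposed an estimator which is closely related to Beale's
estimator:

$$\bar y_T = \bar y_R\left[1 - \frac{1-f}{n}\,(c_{xx} - c_{xy})\right]. \tag{2.7}$$

Tin's correction, i.e. the second term of (2.7), is a sample analog of the
bias (Cochran, 1977, p. 161)

$$E(\bar y_R - \bar Y) = \frac{1-f}{n}\,(C_{xx} - C_{xy})\,\bar Y + O(n^{-2}), \tag{2.8}$$

where

$$C_{xx} = \frac{S_x^2}{\bar X^2}, \qquad C_{xy} = \frac{S_{xy}}{\bar X\,\bar Y} \tag{2.9}$$

are the population analogs of $c_{xx}$ and $c_{xy}$ in (2.6).
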

Cochran (1977) pointed out that $\bar y_B$ and $\bar y_T$ have the same
leading term of order $n^{-1}$. In general, $\bar y_B$ and $\bar y_T$ should
perform very similarly in large samples.

We now consider a decomposition of the finite population. Using this
decomposition, we can give an interpretation of the bias in (2.8). In fact,
we will see that the leading term of the bias of $\bar y_R$ is caused by the
non-zero intercept of the regression line fitted to the finite population.


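Given a finite population $\{(y_i, x_i), i = 1, \ldots, N\}$, we can
decompose the population as follows:

$$y_i = \alpha + \beta x_i + e_i, \tag{2.10}$$

where

$$\beta = \frac{\sum_1^N (x_i - \bar X)(y_i - \bar Y)}{\sum_1^N (x_i - \bar X)^2}, \tag{2.11}$$

$$\alpha = \bar X\,(R - \beta), \qquad R = \bar Y / \bar X. \tag{2.12}$$

It is easy to see that $\{e_i\}$ satisfies

$$\sum_1^N e_i = 0, \qquad \sum_1^N e_i x_i = 0. \tag{2.13}$$
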

Using (2.10), we can find the leading term of the bias of $\bar y_R$ in terms
of the intercept $\alpha$:

$$E(\bar y_R - \bar Y) = \frac{1-f}{n}\,\alpha\,C_{xx} + O(n^{-2}), \tag{2.14}$$

which follows from (2.8), (2.10), (2.13), and
$C_{xy} = \frac{\beta \bar X}{\bar Y}\,C_{xx}$.
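
The step from (2.8) to (2.14) can be written out explicitly. The short derivation below is a sketch that assumes the usual definitions $C_{xx} = S_x^2/\bar X^2$ and $C_{xy} = S_{xy}/(\bar X \bar Y)$, with $S_x^2$ and $S_{xy}$ the population variance and covariance; it uses only the fact that the least squares slope satisfies $S_{xy} = \beta S_x^2$ and that $\alpha = \bar Y - \beta \bar X$.

```latex
\begin{align*}
S_{xy} = \beta S_x^2
  \quad&\Longrightarrow\quad
  C_{xy} = \frac{\beta S_x^2}{\bar X\,\bar Y}
         = \frac{\beta \bar X}{\bar Y}\, C_{xx}, \\
(C_{xx} - C_{xy})\,\bar Y
  &= C_{xx}\,(\bar Y - \beta \bar X)
   = \alpha\, C_{xx}.
\end{align*}
```

Multiplying the last line by $(1-f)/n$ turns the leading term of (2.8) into the leading term of (2.14).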

Formula (2.14) shows that the bias of $\bar y_R$ is caused by the non-zero
intercept $\alpha$ in the decomposition. Furthermore, the leading term of the
bias depends on $\alpha C_{xx}$. Since $C_{xx} > 0$, the sign of the leading
term is the same as the sign of $\alpha$. That is, if $\alpha > 0$ then we
would expect $\bar y_R$ to slightly overestimate $\bar Y$; if $\alpha < 0$
then $\bar y_R$ will underestimate $\bar Y$. There is no certainty that for
small n the actual bias is reduced. However, we have reason to expect that
the bias will be diminished when n is not too small and the population is not
extremely irregular.
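
The relation between the intercept and the ratio bias can be checked numerically. The following is a minimal sketch, not part of the original report: the function names and the small population are hypothetical, and the code assumes the normalizations $C_{xx} = S_x^2/\bar X^2$, $C_{xy} = S_{xy}/(\bar X \bar Y)$ with divisor $N-1$. It computes the leading term both from (2.8) and from (2.14), and obtains the exact bias by enumerating every simple random sample.

```python
from itertools import combinations

def decompose(y, x):
    """Population fit y_i = alpha + beta * x_i + e_i, as in (2.10)-(2.12)."""
    N = len(y)
    X_bar, Y_bar = sum(x) / N, sum(y) / N
    sxx = sum((xi - X_bar) ** 2 for xi in x)
    sxy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = Y_bar - beta * X_bar  # same as X_bar * (R - beta), R = Y_bar / X_bar
    e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    return alpha, beta, e

def leading_bias_cochran(y, x, n):
    """Leading term (1 - f)/n * (C_xx - C_xy) * Y_bar, as in (2.8)."""
    N = len(y)
    X_bar, Y_bar = sum(x) / N, sum(y) / N
    s2x = sum((xi - X_bar) ** 2 for xi in x) / (N - 1)
    sxy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(x, y)) / (N - 1)
    c_xx, c_xy = s2x / X_bar ** 2, sxy / (X_bar * Y_bar)
    return (1 - n / N) / n * (c_xx - c_xy) * Y_bar

def leading_bias_intercept(y, x, n):
    """Leading term (1 - f)/n * alpha * C_xx, as in (2.14)."""
    N = len(y)
    X_bar = sum(x) / N
    s2x = sum((xi - X_bar) ** 2 for xi in x) / (N - 1)
    alpha, _, _ = decompose(y, x)
    return (1 - n / N) / n * alpha * s2x / X_bar ** 2

def exact_ratio_bias(y, x, n):
    """Exact bias of the ratio estimator, enumerating every sample of size n."""
    N = len(y)
    X_bar, Y_bar = sum(x) / N, sum(y) / N
    estimates = [X_bar * sum(y[i] for i in s) / sum(x[i] for i in s)
                 for s in combinations(range(N), n)]
    return sum(estimates) / len(estimates) - Y_bar

# Hypothetical population, made up for illustration only.
x = [2, 3, 5, 7, 8, 10, 12, 15]
y = [5, 6, 9, 12, 12, 16, 19, 22]
alpha, beta, e = decompose(y, x)
print("alpha =", alpha, "beta =", beta)
print("leading term via (2.8): ", leading_bias_cochran(y, x, 4))
print("leading term via (2.14):", leading_bias_intercept(y, x, 4))
print("exact bias by enumeration:", exact_ratio_bias(y, x, 4))
```

The two leading terms coincide for any population, while the enumerated bias differs from them by the higher-order remainder. For a population with $y_i$ exactly proportional to $x_i$ the intercept is zero and the ratio estimator has no bias at all.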
3. Bias of the Regression Estimator


Like the ratio estimator, the linear regression estimator $\bar y_{lr}$ is
also a biased estimator. The leading term of the bias is given as follows
(Cochran, 1977, p. 198):

$$E(\bar y_{lr} - \bar Y) = -\frac{1-f}{n}\,\frac{E(e_i (x_i - \bar X)^2)}{S_x^2} + O(n^{-2}), \tag{3.1}$$

where $e_i$ is defined in (2.10) and $E(\cdot)$ applied to a function of unit
i denotes its average over the N population units.


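To provide an interpretation for the leading term of the bias of
$\bar y_{lr}$, we use the following quadratic decomposition of the finite
population:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + d_i, \tag{3.2}$$

where $\{\beta_j, j = 0, 1, 2\}$ minimizes

$$\sum_1^N (y_i - \beta_0 - \beta_1 x_i - \beta_2 x_i^2)^2.$$

That is,

$$(\beta_0, \beta_1, \beta_2)' = (X'X)^{-1} X'y, \tag{3.3}$$

where

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & x_N^2 \end{pmatrix}, \qquad y = (y_1, y_2, \ldots, y_N)'. \tag{3.4}$$

From least squares theory, it is easy to see that

$$\sum_1^N d_i = 0, \qquad \sum_1^N d_i x_i = 0, \qquad \sum_1^N d_i x_i^2 = 0. \tag{3.5}$$
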

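Using the decomposition (3.2), we can show

Theorem 3.1. Let $\beta_0, \beta_1, \beta_2$ be as in (3.3) and let
$\bar X^{(t)} = \frac{1}{N}\sum_1^N x_i^t$ be the t-th population moment of x
(with $\bar x^{(t)}$ the t-th sample moment). Then

$$E(\bar y_{lr} - \bar Y) = -\frac{1-f}{n}\,\frac{\beta_2 K}{S_x^2} + O(n^{-2}),$$

where

$$K = \bar X^{(4)} - (\bar X^{(2)}, \bar X^{(3)}) \begin{pmatrix} 1 & \bar X \\ \bar X & \bar X^{(2)} \end{pmatrix}^{-1} \begin{pmatrix} \bar X^{(2)} \\ \bar X^{(3)} \end{pmatrix} = \frac{\begin{vmatrix} 1 & \bar X & \bar X^{(2)} \\ \bar X & \bar X^{(2)} & \bar X^{(3)} \\ \bar X^{(2)} & \bar X^{(3)} & \bar X^{(4)} \end{vmatrix}}{\begin{vmatrix} 1 & \bar X \\ \bar X & \bar X^{(2)} \end{vmatrix}}.$$
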

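Proof. We can rewrite (2.10) in the matrix form

$$y = X H a + e, \tag{3.6}$$

where

$$H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad a = \begin{pmatrix} \alpha \\ \beta \end{pmatrix},$$

$e = (e_1, e_2, \ldots, e_N)'$, and X and y are defined in (3.4). Using
(2.13), (3.3) and (3.6),

$$\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \\ 0 \end{pmatrix} + \left(\tfrac{1}{N} X'X\right)^{-1} \begin{pmatrix} \frac{1}{N}\sum_1^N e_i \\ \frac{1}{N}\sum_1^N e_i x_i \\ \frac{1}{N}\sum_1^N e_i x_i^2 \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \\ 0 \end{pmatrix} + \left(\tfrac{1}{N} X'X\right)^{-1} \begin{pmatrix} 0 \\ 0 \\ \frac{1}{N}\sum_1^N e_i x_i^2 \end{pmatrix}. \tag{3.7}$$
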

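Therefore

$$\beta_2 = c\,E(e_i x_i^2), \tag{3.8}$$

where $c$ is the (3,3) element of $(\frac{1}{N} X'X)^{-1}$. The explicit
expression of $c$ can be found using the formula for matrix inversion:

$$c = \frac{\begin{vmatrix} 1 & \bar X \\ \bar X & \bar X^{(2)} \end{vmatrix}}{\begin{vmatrix} 1 & \bar X & \bar X^{(2)} \\ \bar X & \bar X^{(2)} & \bar X^{(3)} \\ \bar X^{(2)} & \bar X^{(3)} & \bar X^{(4)} \end{vmatrix}} = \left[\bar X^{(4)} - (\bar X^{(2)}, \bar X^{(3)}) \begin{pmatrix} 1 & \bar X \\ \bar X & \bar X^{(2)} \end{pmatrix}^{-1} \begin{pmatrix} \bar X^{(2)} \\ \bar X^{(3)} \end{pmatrix}\right]^{-1} = K^{-1}. \tag{3.9}$$
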


Note that in writing the last equality we used the formula for the
determinant of a partitioned matrix (e.g. Rao, 1971, p. 32). Using (3.9) and
the fact that $\frac{1}{N}(X'X)$ is positive definite, we have

$$K = c^{-1} > 0. \tag{3.10}$$

From (2.13), we have

$$E(e_i x_i^2) = E(e_i (x_i - \bar X)^2). \tag{3.11}$$

Theorem 3.1 follows from (3.1), (3.8), (3.10) and (3.11). □



Theorem 3.1 may provide a better understanding of the bias of
$\bar y_{lr}$. The leading term of the bias is due to the non-zero
$\beta_2$, the coefficient of the quadratic term in the decomposition (3.2).
Furthermore, the direction of the bias depends only on the sign of
$\beta_2$. If $\beta_2 > 0$, then we expect $\bar y_{lr}$ to underestimate
$\bar Y$, whereas $\beta_2 < 0$ indicates that $\bar y_{lr}$ overestimates
$\bar Y$.
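
The identity behind Theorem 3.1, $\beta_2 K = E(e_i x_i^2) = E(e_i (x_i - \bar X)^2)$, can be verified exactly on a small population. The sketch below is illustrative only: the function names and data are hypothetical, exact rational arithmetic is used so that the equalities hold without rounding, and $E(\cdot)$ is taken to be the average over the N population units.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def quadratic_bias_pieces(y, x):
    """Return (beta2, K, E[e x^2], E[e (x - X_bar)^2]) for a finite population."""
    x = [Fraction(v) for v in x]
    y = [Fraction(v) for v in y]
    N = len(x)
    m = [sum(xi ** t for xi in x) / N for t in range(5)]   # moments X^(t), t = 0..4
    r = [sum(yi * xi ** t for xi, yi in zip(x, y)) / N for t in range(3)]
    M = [[m[0], m[1], m[2]], [m[1], m[2], m[3]], [m[2], m[3], m[4]]]  # (1/N) X'X
    # beta2 by Cramer's rule: replace the third column of M by (1/N) X'y.
    M2 = [[M[0][0], M[0][1], r[0]],
          [M[1][0], M[1][1], r[1]],
          [M[2][0], M[2][1], r[2]]]
    beta2 = det3(M2) / det3(M)
    K = det3(M) / (m[2] - m[1] ** 2)                       # determinant ratio, cf. (3.9)
    # Residuals e_i of the linear decomposition (2.10).
    beta = (r[1] - m[1] * r[0]) / (m[2] - m[1] ** 2)
    alpha = r[0] - beta * m[1]
    e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    e_x2 = sum(ei * xi ** 2 for ei, xi in zip(e, x)) / N
    e_xc2 = sum(ei * (xi - m[1]) ** 2 for ei, xi in zip(e, x)) / N
    return beta2, K, e_x2, e_xc2

def leading_bias_lr(y, x, n):
    """-(1 - f)/n * beta2 * K / S_x^2, the leading term in Theorem 3.1."""
    N = len(x)
    beta2, K, _, _ = quadratic_bias_pieces(y, x)
    X_bar = sum(Fraction(v) for v in x) / N
    s2x = sum((Fraction(v) - X_bar) ** 2 for v in x) / (N - 1)
    return -(1 - Fraction(n, N)) / n * beta2 * K / s2x

# Hypothetical convex population, made up for illustration only.
x = [1, 2, 3, 4, 6, 8, 9, 12]
y = [3, 4, 8, 9, 20, 30, 41, 70]
beta2, K, e_x2, e_xc2 = quadratic_bias_pieces(y, x)
print("beta2 =", float(beta2), "K =", float(K))
print("beta2 * K == E[e x^2]:", beta2 * K == e_x2)
print("leading bias term for n = 4:", float(leading_bias_lr(y, x, 4)))
```

Because $K > 0$, the sign of the printed leading term is always opposite to the sign of `beta2`, which is the content of the remark following Theorem 3.1.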


4.  Bias  of  the  Multiple  Regression  Estimator 

The discussion so far has been restricted to the situation in which auxiliary
information on just one x-variate is to be used for improving the precision
of estimates. In practice, we may have information about several x-variates
and it may be considered important to make use of all the available
information to get a more precise estimator. Several methods of using p
variates $x_1, x_2, \ldots, x_p$ are proposed in the literature. The most
popular estimator is the multivariate regression estimator

$$\bar y_{mlr} = \bar y - \sum_{j=1}^p \hat\beta_j (\bar x_j - \bar X_j), \tag{4.1}$$

where $\{\hat\beta_j, j = 1, \ldots, p\}$ is the least squares estimate of
the corresponding population parameter $\{\beta_j, j = 1, \ldots, p\}$ in the
linear regression model. $\bar y_{mlr}$ is the best linear unbiased estimator
under the following superpopulation model (Royall, 1970):

$$y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ji} + \epsilon_i, \tag{4.2}$$

where

$$E_M(\epsilon_i) = 0; \qquad E_M(\epsilon_i \epsilon_j) = \begin{cases} \sigma^2 w_i & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

with $w_i = 1$. Clearly, the multivariate regression estimator is a
consistent estimator of $\bar Y$.



Like the ratio estimator and the simple regression estimator,
$\bar y_{mlr}$ also has a bias of order $n^{-1}$. For p = 2, Konijn (1973)
gave an explicit but complicated expression of the leading term of the bias
for the bivariate regression estimator, i.e.

$$\frac{1-f}{n}\,\frac{1}{1-\rho^2}\left[\frac{2\rho S_{e12}}{S_{x_1} S_{x_2}} - \frac{S_{e11}}{S_{x_1}^2} - \frac{S_{e22}}{S_{x_2}^2}\right], \tag{4.3}$$

where

$$S_{e12} = \frac{1}{N-1}\sum_1^N e_i (x_{1i} - \bar X_1)(x_{2i} - \bar X_2),$$

with $S_{e11}$ and $S_{e22}$ defined analogously,

$$e_i = (y_i - \bar Y) - B_1 (x_{1i} - \bar X_1) - B_2 (x_{2i} - \bar X_2),$$

$B_1$ and $B_2$ are the population regression coefficients of y on $x_1$,
$x_2$, $\rho$ is the correlation coefficient between $x_1$ and $x_2$, and
$S_{x_1}^2$, $S_{x_2}^2$ are the population variances of $x_1$, $x_2$
respectively.
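Our purpose is to find a simple formula for the leading term of the bias of
$\bar y_{mlr}$ and of $\bar y_w$, which belongs to a more general class of
weighted multivariate regression estimators:

$$\bar y_w = (\bar X_0, \bar X_1, \bar X_2, \ldots, \bar X_p)\,(X_s' W_s^{-1} X_s)^{-1} X_s' W_s^{-1} y_s = \tfrac{1}{N} 1_N' X\,(X_s' W_s^{-1} X_s)^{-1} X_s' W_s^{-1} y_s,$$

where

$$X = \begin{pmatrix} x_{01} & x_{11} & \cdots & x_{p1} \\ \vdots & \vdots & & \vdots \\ x_{0N} & x_{1N} & \cdots & x_{pN} \end{pmatrix}, \qquad X_s = \begin{pmatrix} x_{01} & x_{11} & \cdots & x_{p1} \\ \vdots & \vdots & & \vdots \\ x_{0n} & x_{1n} & \cdots & x_{pn} \end{pmatrix},$$

$$\bar X_j = \frac{1}{N}\sum_1^N x_{ji}, \qquad y_s = (y_1, y_2, \ldots, y_n)', \qquad 1_N = (1, 1, \ldots, 1)',$$

and

$$W_s = \mathrm{diag}(w_1, w_2, \ldots, w_n)$$

is a sub-matrix of

$$W = \mathrm{diag}(w_1, w_2, \ldots, w_N).$$
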

From Theorem 1 of Wright (1983), we know that $\bar y_w$ is a consistent
estimator of $\bar Y$ if

$$1_N \in \mathrm{col}(W^{-1} X), \quad \text{i.e.,} \quad W 1_N = X c \ \text{for some } c. \tag{4.4}$$

We will consider the class of estimators $\bar y_w$ satisfying this
condition. If $\bar y_w$ contains the intercept term, then $x_{0i} = 1$;
otherwise $\bar X_0$ should be omitted from $\bar y_w$. It is easy to see
that $\bar y_R$, $\bar y_{lr}$ and $\bar y_{mlr}$ are special cases of
$\bar y_w$.


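Using the weighted least squares notation, we have

$$\bar y_w = (\bar X_0, \bar X_1, \bar X_2, \ldots, \bar X_p)\,\hat\beta_w = \bar X \hat\beta_w,$$

where

$$\hat\beta_w = (X_s' W_s^{-1} X_s)^{-1} X_s' W_s^{-1} y_s. \tag{4.5}$$

The following decomposition of the finite population will be used to prove
our key result:

$$y = X \beta_w + e, \tag{4.6}$$

where

$$\beta_w = (X' W^{-1} X)^{-1} X' W^{-1} y. \tag{4.7}$$

It is easy to see that

$$X' W^{-1} e = 0. \tag{4.8}$$

Lemma 4.1. Let $\hat\beta_w$, $\beta_w$ be defined as in (4.5), (4.7). Then

$$\hat\beta_w - \beta_w = (X_s' W_s^{-1} X_s)^{-1} X_s' W_s^{-1} e_s = O_p(n^{-0.5})$$

$$= S_{xx,w}^{-1}\,\bar u + O_p(n^{-1}), \tag{4.9}$$

where

$$S_{xx,w} = \frac{1}{N} X' W^{-1} X, \qquad \bar u = (\bar u_0, \bar u_1, \ldots, \bar u_p)' \tag{4.10}$$

and

$$\bar u_j = \frac{1}{n}\sum_1^n u_{ji}, \qquad u_{ji} = x_{ji}\,w_i^{-1} e_i. \tag{4.11}$$
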

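Proof. Since

$$\hat\beta_w = \beta_w + (X_s' W_s^{-1} X_s)^{-1} X_s' W_s^{-1} e_s, \tag{4.12}$$

it is easy to see that

$$\frac{1}{n} X_s' W_s^{-1} X_s = \frac{1}{N} X' W^{-1} X + O_p(n^{-0.5}) = S_{xx,w} + O_p(n^{-0.5}). \tag{4.13}$$

And from (4.8), we have

$$\frac{1}{n} X_s' W_s^{-1} e_s = (\bar u_0, \bar u_1, \ldots, \bar u_p)' = O_p(n^{-0.5}). \tag{4.14}$$

Hence, using (4.13) and (4.14),

$$\hat\beta_w - \beta_w = S_{xx,w}^{-1}\,\bar u + O_p(n^{-1}).$$

This completes the proof of Lemma 4.1. □
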


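Theorem 4.1. If $1_N \in \mathrm{col}(W^{-1} X)$, then

$$E(\bar y_w - \bar Y) = -\frac{1-f}{n}\,\mathrm{tr}(S_{xx,w}^{-1} S_{xu,w}) + O(n^{-2}),$$

where tr(A) denotes the trace of the matrix A, $S_{xx,w}$ is defined in
(4.10), and $S_{xu,w} = [S_{u_j, x_k}]$ with

$$S_{u_j, x_k} = \frac{1}{N-1}\sum_1^N u_{ji}(x_{ki} - \bar X_k) = \frac{1}{N-1}\sum_1^N x_{ji}\,e_i\,w_i^{-1}(x_{ki} - \bar X_k).$$

Proof. Let $\bar x = 1_n' X_s / n$ and $\bar X = 1_N' X / N$. Since
$1_N \in \mathrm{col}(W^{-1} X)$, we have $\bar y = \bar x \hat\beta_w$,
which together with Lemma 4.1 implies

$$\bar y_w = [\bar x - (\bar x - \bar X)]\,\hat\beta_w = \bar y - (\bar x - \bar X)[\beta_w + S_{xx,w}^{-1}\,\bar u + O_p(n^{-1})]$$

$$= \bar y - (\bar x - \bar X)\beta_w - (\bar x - \bar X) S_{xx,w}^{-1}\,\bar u + O_p(n^{-1.5}).$$

Hence, we have

$$E(\bar y_w - \bar Y) = -E((\bar x - \bar X) S_{xx,w}^{-1}\,\bar u) + O(n^{-2}),$$

$$E((\bar x - \bar X) S_{xx,w}^{-1}\,\bar u) = E(\mathrm{tr}[(\bar x - \bar X) S_{xx,w}^{-1}\,\bar u]) = E(\mathrm{tr}[S_{xx,w}^{-1}\,\bar u (\bar x - \bar X)])$$

$$= \mathrm{tr}(S_{xx,w}^{-1} E[\bar u (\bar x - \bar X)]) = \frac{1-f}{n}\,\mathrm{tr}(S_{xx,w}^{-1} S_{xu,w}).$$

In writing the above equalities, we use the fact that tr(AB) = tr(BA) and

$$E(\bar u (\bar x - \bar X)) = [E(\bar u_j (\bar x_k - \bar X_k))] = \frac{1-f}{n}\,[S_{u_j, x_k}].$$

This completes the proof of Theorem 4.1. □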

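Three special cases of Theorem 4.1 are mentioned below.

(1) For the ratio estimator, $X = (x_1, x_2, \ldots, x_N)'$,
$W = \mathrm{diag}(x_1, x_2, \ldots, x_N)$, $\beta_w = \bar Y / \bar X = R$
and $e_i = y_i - \beta_w x_i = y_i - R x_i$. In terms of the decomposition
(2.10),

$$S_{xu,w} = \frac{1}{N-1}\sum_1^N e_i (x_i - \bar X) = S_x^2 (\beta - R) = -\alpha\,\frac{S_x^2}{\bar X}.$$

Since $S_{xx,w} = \bar X$, the leading term of the bias of $\bar y_R$ is

$$-\frac{1-f}{n}\,\mathrm{tr}(S_{xx,w}^{-1} S_{xu,w}) = \frac{1-f}{n}\,\alpha\,\frac{S_x^2}{\bar X^2},$$

which is the same as (2.14).

(2) For the regression estimator, we have

$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad W = I_{N \times N}.$$

It is easy to see that

$$S_{xx,w} = \begin{pmatrix} 1 & \bar X \\ \bar X & \bar X^{(2)} \end{pmatrix}, \qquad S_{xu,w} = \begin{pmatrix} 0 & 0 \\ 0 & E(e_i x_i^2) \end{pmatrix}.$$

Hence, the leading term of the bias of $\bar y_{lr}$ is

$$-\frac{1-f}{n}\,\mathrm{tr}(S_{xx,w}^{-1} S_{xu,w}) = -\frac{1-f}{n}\,\frac{E(e_i x_i^2)}{S_x^2},$$

which agrees with (3.1) in view of (3.11), up to factors of $N/(N-1)$.

(3) It can be verified that Konijn's (1973) expression of the bias for the
special case with p = 2 is the same as our formula in Theorem 4.1. Our
formula is much more general than Konijn's even for p = 2. Note that for the
multivariate regression estimator, W = I and $1_N$ is in the design matrix X,
and

$$\sum_1^N e_i (x_{ki} - \bar X_k) = 0,$$

which implies

$$\frac{1}{N-1}\sum_1^N x_{ji}\,e_i (x_{ki} - \bar X_k) = \frac{1}{N-1}\sum_1^N (x_{ji} - \bar X_j)\,e_i (x_{ki} - \bar X_k).$$

This is the same as the $S_{e12}$'s in Konijn's formula in (4.3).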
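
The first two special cases can be reproduced numerically from the general trace formula. The sketch below is an illustration under stated assumptions, not the author's code: the helper names `solve` and `trace_term` and the population are hypothetical, $S_{xx,w}$ is normalized by $1/N$ and $S_{xu,w}$ by $1/(N-1)$, and exact rational arithmetic is used so that the comparisons are exact.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # row with a nonzero pivot
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

def trace_term(X, w, y):
    """tr(S_xx,w^{-1} S_xu,w) for design rows X, weights w and values y."""
    X = [[Fraction(v) for v in row] for row in X]
    w = [Fraction(v) for v in w]
    y = [Fraction(v) for v in y]
    N, q = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] / w[i] for i in range(N)) for k in range(q)]
         for j in range(q)]
    b = [sum(X[i][j] * y[i] / w[i] for i in range(N)) for j in range(q)]
    beta_w = solve(A, b)                                   # population WLS fit, (4.7)
    e = [y[i] - sum(X[i][j] * beta_w[j] for j in range(q)) for i in range(N)]
    means = [sum(X[i][k] for i in range(N)) / N for k in range(q)]
    S_xx = [[A[j][k] / N for k in range(q)] for j in range(q)]
    S_xu = [[sum(X[i][j] * e[i] / w[i] * (X[i][k] - means[k]) for i in range(N)) / (N - 1)
             for k in range(q)] for j in range(q)]
    # The k-th diagonal entry of S_xx^{-1} S_xu is component k of solve(S_xx, column k).
    return sum(solve(S_xx, [S_xu[j][k] for j in range(q)])[k] for k in range(q))

# Hypothetical population, made up for illustration only.
x = [2, 3, 5, 7, 8, 10, 12, 15]
y = [5, 6, 9, 12, 12, 16, 19, 22]
N = len(x)
X_bar, Y_bar = Fraction(sum(x), N), Fraction(sum(y), N)
S2x = sum((xi - X_bar) ** 2 for xi in x) / (N - 1)
Sxy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(x, y)) / (N - 1)
beta = Sxy / S2x
alpha = Y_bar - beta * X_bar
e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
E_ex2 = sum(ei * xi ** 2 for ei, xi in zip(e, x)) / N

ratio_tr = trace_term([[xi] for xi in x], x, y)            # case (1): W = diag(x_i)
reg_tr = trace_term([[1, xi] for xi in x], [1] * N, y)     # case (2): W = I
print("ratio case, -tr:", float(-ratio_tr), " alpha*S2x/X_bar^2:", float(alpha * S2x / X_bar ** 2))
print("regression case, tr:", float(reg_tr), " E[e x^2]/S2x:", float(E_ex2 / S2x))
```

Under these normalizations the ratio case matches $\alpha S_x^2/\bar X^2$ exactly, while the regression case matches $E(e_i x_i^2)/S_x^2$ only up to the factor $(N/(N-1))^2$, which is negligible for large N.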

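5. Efficiency Comparison of $\bar y$, $\bar y_R$ and $\bar y_{lr}$

The variance of $\bar y$ is well known:

$$\mathrm{Var}(\bar y) = \frac{1-f}{n}\,S_y^2 = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N (y_i - \bar Y)^2. \tag{5.1}$$

There are no closed forms for $\mathrm{Var}(\bar y_R)$ and
$\mathrm{Var}(\bar y_{lr})$. Each can be approximated by its approximate
variance (Cochran, 1977):

$$V_R = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N \left(y_i - \frac{\bar Y}{\bar X}\,x_i\right)^2, \tag{5.2}$$

$$V_{lr} = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N \left[(y_i - \bar Y) - B(x_i - \bar X)\right]^2, \tag{5.3}$$

where f = n/N is the sampling fraction and $B = \beta$ is given in (2.11).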

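We would like to rewrite the expressions of $V_R$, $V_{lr}$ and
$\mathrm{Var}(\bar y)$ in terms of the decomposition in (2.10). Using the
decomposition (2.10), we have

Theorem 5.1. Let $\alpha$, $\beta$ be defined as in (2.10). Then

(a) $\bar y_R - \bar Y = \left(1 - \frac{\bar X}{\bar x}\right)(-\alpha) + \left(\frac{\bar X}{\bar x}\right)\bar e$, where $\bar e = \frac{1}{n}\sum_1^n e_i$.

(b) $V_R = \frac{1-f}{n}\left(\alpha^2\,\frac{S_x^2}{\bar X^2} + S_e^2\right)$, where $S_e^2 = \frac{1}{N-1}\sum_1^N e_i^2$.

Proof. From (2.10) and (2.13), we have

$$\bar y_R = \alpha\left(\frac{\bar X}{\bar x}\right) + \beta \bar X + \left(\frac{\bar X}{\bar x}\right)\bar e \quad \text{and} \quad \bar Y = \alpha + \beta \bar X,$$

which implies Part (a). From (5.2), we have

$$V_R = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N d_i^2, \qquad d_i = -\alpha\,\frac{x_i - \bar X}{\bar X} + e_i. \tag{5.4}$$

Using (2.13), it is easy to see that

$$\sum_1^N e_i (x_i - \bar X) = 0. \tag{5.5}$$

Part (b) follows immediately from (5.4)-(5.5). □
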

Part (a) of Theorem 5.1 shows that $\bar y_R - \bar Y$ depends only on the
intercept $\alpha$, not on the slope $\beta$. $\bar y_R - \bar Y$ can be
written as a "weighted" average of $-\alpha$ and $\bar e$, where the weight
for $\bar e$ is $\bar X / \bar x$. (Note that the weight $\bar X / \bar x$
may be greater than 1.) For a sample with $\bar x = \bar X$, the weight
associated with $\bar e$ is 1, and the weight for $-\alpha$ is zero. Hence,
$\alpha$ has no effect on the difference $\bar y_R - \bar Y$ for any sample
with $\bar x = \bar X$. Such a sample is called a "balanced sample" in Royall
and Herson (1973).

From Part (b) of Theorem 5.1, we can see that $V_R$, the leading term of
$\mathrm{Var}(\bar y_R)$, is composed of two sources of variation. The first
component combines the non-zero intercept ($\alpha$) of the decomposition
with the population characteristic ($S_x^2 / \bar X^2$) of the x-variate. The
second is the population variance ($S_e^2$) of the e-variate. For $\bar y$,
the mean-per-unit estimator, we have

Theorem 5.2.

(a) $\bar y - \bar Y = \beta(\bar x - \bar X) + \bar e$.

(b) $\mathrm{Var}(\bar y) = \frac{1-f}{n}\,(\beta^2 S_x^2 + S_e^2)$.

(c) $V_R \le \mathrm{Var}(\bar y)$ if and only if $|\alpha| \le |\beta|\,\bar X$,
where $\alpha$ and $\beta$ are defined in (2.11), (2.12).

Proof. Part (a) is trivial. Using (2.10) and (2.13), we have

$$y_i - \bar Y = \beta(x_i - \bar X) + e_i. \tag{5.6}$$

Part (b) follows from (5.1), (5.5) and (5.6). Part (c) follows from Part (b)
and Theorem 5.1. □

For the regression estimator $\bar y_{lr}$, we have

Theorem 5.3.

(a) $\bar y_{lr} - \bar Y = \bar e + O_p(n^{-1})$.

(b) $V_{lr} = \frac{1-f}{n}\,S_e^2$.

(c) $V_{lr} \le V_R$, with strict inequality if $\alpha \ne 0$.

(d) $V_{lr} \le \mathrm{Var}(\bar y)$, with strict inequality if $\beta \ne 0$.

Proof. Parts (a) and (b) are trivial. Part (c) follows from Theorem 5.1.
Part (d) follows from Theorem 5.2. □
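
Theorems 5.1 to 5.3 express the three variance formulae through $\alpha$, $\beta$, $S_x^2$ and $S_e^2$. A minimal numerical sketch of these identities follows; the function name and the population are hypothetical, and the direct form of $V_{lr}$ assumes the usual approximate variance of the regression estimator, $\frac{1-f}{n}\frac{1}{N-1}\sum[(y_i - \bar Y) - \beta(x_i - \bar X)]^2$.

```python
from fractions import Fraction

def variances(y, x, n):
    """Return direct and decomposed forms of Var(y_bar), V_R and V_lr."""
    x = [Fraction(v) for v in x]
    y = [Fraction(v) for v in y]
    N = len(x)
    scale = (1 - Fraction(n, N)) / n / (N - 1)   # (1 - f)/n * 1/(N - 1)
    X_bar, Y_bar = sum(x) / N, sum(y) / N
    ssx = sum((xi - X_bar) ** 2 for xi in x)     # (N - 1) * S_x^2
    beta = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(x, y)) / ssx
    alpha = Y_bar - beta * X_bar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))  # (N - 1) * S_e^2
    R = Y_bar / X_bar
    direct = {
        "mean": scale * sum((yi - Y_bar) ** 2 for yi in y),
        "ratio": scale * sum((yi - R * xi) ** 2 for xi, yi in zip(x, y)),
        "regression": scale * sum((yi - Y_bar - beta * (xi - X_bar)) ** 2
                                  for xi, yi in zip(x, y)),
    }
    decomposed = {
        "mean": scale * (beta ** 2 * ssx + sse),                 # Theorem 5.2(b)
        "ratio": scale * (alpha ** 2 * ssx / X_bar ** 2 + sse),  # Theorem 5.1(b)
        "regression": scale * sse,                               # Theorem 5.3(b)
    }
    return direct, decomposed

# Hypothetical population, made up for illustration only.
x = [2, 3, 5, 7, 8, 10, 12, 15]
y = [5, 6, 9, 12, 12, 16, 19, 22]
direct, decomposed = variances(y, x, 4)
for name in ("mean", "ratio", "regression"):
    print(name, float(direct[name]), float(decomposed[name]))
```

The two columns agree for every estimator, and the regression value is never larger than the other two, in line with Parts (c) and (d) of Theorem 5.3.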

From Part (b) of Theorem 5.3, we see that $V_{lr}$ does not depend on
$\alpha$ and $\beta$, whereas $V_R$ depends on the intercept $\alpha$ and
$\mathrm{Var}(\bar y)$ depends on the slope $\beta$. The reason is that the
underlying model for $\bar y_{lr}$ has the same structure as the
decomposition (2.10). On the other hand, the underlying model for
$\bar y_R$ is

$$y_i = \beta x_i + \epsilon_i,$$

hence $\bar y_R$ captures the slope term in the decomposition but not the
intercept term. The underlying model for $\bar y$ is

$$y_i = \alpha + \epsilon_i,$$

hence the intercept in (2.10) can be captured by $\bar y$ but not the slope.

Part (c) of Theorem 5.3 shows that, to the order of approximation considered,
$\bar y_{lr}$ is never less efficient than $\bar y_R$. Note that the result
of Part (c) is well known (Cochran, 1977, p. 196) without using the finite
population decomposition approach. For estimating cell totals in tables of
the type typically constructed from survey data, Fuller (1977) showed the
superior performance of the regression estimator. However, in practice
$\bar y_R$ is more popular than $\bar y_{lr}$. One reason is the
computational simplicity of $\bar y_R$ over $\bar y_{lr}$ in complex
situations.


Wright (1983) characterized a class of consistent estimators for a general
sampling plan. Applying his result to simple random sampling, we can see
that

$$\bar y_w = \hat\alpha_w + \hat\beta_w \bar X \tag{5.7}$$

is a consistent estimator of $\bar Y$ if $w_i$ is chosen to be either 1,
$x_i$, or $c_1 + c_2 x_i$, where

$$(\hat\alpha_w, \hat\beta_w)' = (X_s' W_s^{-1} X_s)^{-1} X_s' W_s^{-1} y_s,$$

$$X_s = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad y_s = (y_1, y_2, \ldots, y_n)', \qquad W_s = \mathrm{diag}(w_1, w_2, \ldots, w_n).$$

However, it is not clear how to choose the "optimal" weight. One criterion we
may use is the minimum variance of $\bar y_w$, according to which we can
choose the "best" weight among this class of consistent estimators of
$\bar Y$. Note that $\bar y_{lr}$ is also a special case of $\bar y_w$ with
$w_i = 1$. Theorem 5.4 shows that $\bar y_{lr}$ ($w_i = 1$) is the best
choice.

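Theorem 5.4. Let $V_w$ denote the leading term of $\mathrm{Var}(\bar y_w)$
and let $\alpha$, $\beta$ be defined as in (2.10). If $w_i = c_1 + c_2 x_i$
for some $c_1$, $c_2$, then

(a) $V_w = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N (y_i - \alpha_w - \beta_w x_i)^2$.

(b) For any $\alpha_0$, $\beta_0$,

$$\sum_1^N (y_i - \alpha_0 - \beta_0 x_i)^2 = \sum_1^N (y_i - \alpha - \beta x_i)^2 + \sum_1^N [(\alpha_0 - \alpha) + (\beta_0 - \beta) x_i]^2.$$

(c) $V_{lr} \le V_w$ for any choice of $w_i$, where
$(\alpha_w, \beta_w)' = (X' W^{-1} X)^{-1} X' W^{-1} y$,

$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad y = (y_1, y_2, \ldots, y_N)', \qquad W = \mathrm{diag}(w_1, w_2, \ldots, w_N).$$

Proof. Since Theorem 5.4 is a special case of Theorem 6.1 in Section 6, the
proof is omitted.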

From Part (c), we see that $\bar y_{lr}$ (with $w_i = 1$) is the most
efficient estimator among the class of consistent estimators $\bar y_w$.
This provides a strong justification for the use of $\bar y_{lr}$ in survey
sampling. We would expect Theorem 5.4 to hold also for the multiple
regression estimator when there is more than one auxiliary variable. We will
extend Theorem 5.4 to the multivariate case in Section 6.
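
The comparison in Theorem 5.4 can be illustrated by computing $V_w$ for several admissible weights. The sketch below uses hypothetical function names, a made-up population and arbitrary constants $c_1 = 2$, $c_2 = 3$; it fits the population weighted least squares line for each weight and evaluates the unweighted residual sum of squares that defines $V_w$.

```python
from fractions import Fraction

def wls_line(y, x, w):
    """Population weighted least squares fit of y on (1, x), weights 1/w_i."""
    a00 = sum(1 / wi for wi in w)
    a01 = sum(xi / wi for xi, wi in zip(x, w))
    a11 = sum(xi * xi / wi for xi, wi in zip(x, w))
    b0 = sum(yi / wi for yi, wi in zip(y, w))
    b1 = sum(xi * yi / wi for xi, yi, wi in zip(x, y, w))
    det = a00 * a11 - a01 * a01
    # Solution of the 2x2 normal equations by Cramer's rule.
    return (a11 * b0 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det

def residual_ss(y, x, a, b):
    """Unweighted sum of squares of y_i - a - b * x_i over the population."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

def leading_variance(y, x, w, n):
    """V_w = (1 - f)/n * 1/(N - 1) * sum (y_i - alpha_w - beta_w x_i)^2."""
    N = len(x)
    a_w, b_w = wls_line(y, x, w)
    return (1 - Fraction(n, N)) / n * residual_ss(y, x, a_w, b_w) / (N - 1)

# Hypothetical population and weights, made up for illustration only.
x = [Fraction(v) for v in [2, 3, 5, 7, 8, 10, 12, 15]]
y = [Fraction(v) for v in [5, 6, 9, 12, 12, 16, 19, 22]]
weights = {
    "w = 1": [Fraction(1)] * len(x),
    "w = x": list(x),
    "w = 2 + 3x": [2 + 3 * xi for xi in x],
}
for label, w in weights.items():
    print(label, float(leading_variance(y, x, w, 4)))
```

The value for `w = 1` is the smallest of the three, as Part (c) asserts. Part (b) explains why: any other fitted line adds the non-negative term $\sum[(\alpha_0 - \alpha) + (\beta_0 - \beta)x_i]^2$ to the unweighted residual sum of squares.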

6.  Optimal  Weighted  Multiple  Regression  Estimator 

In this section we consider the efficiency of the estimators based on p
auxiliary variables. Consider the multivariate regression estimator
$\bar y_{mlr}$ and $\bar y_w$, which are defined in Section 4.

There is no exact formula for $\mathrm{Var}(\bar y_w)$ and
$\mathrm{MSE}(\bar y_w)$. It can be easily shown that the leading terms of
$\mathrm{Var}(\bar y_w)$ and $\mathrm{MSE}(\bar y_w)$ are the same. (Note
that this is true only for the case $1_N \in \mathrm{col}(W^{-1} X)$.) Let
$V_w$ denote the leading term of $\mathrm{Var}(\bar y_w)$. Lemma 6.1 finds an
expression for $V_w$.


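Lemma 6.1. If $1_N \in \mathrm{col}(W^{-1} X)$, then

$$\bar y_w - \bar Y = \bar e + O_p(n^{-1})$$

and

$$V_w = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N e_i^2,$$

where

$$e_i = y_i - x_i \beta_w, \qquad \beta_w = (X' W^{-1} X)^{-1} X' W^{-1} y,$$

and $x_i$ denotes the i-th row of X.

Proof. Let $\bar X = 1_N' X / N$ and $\bar x = 1_n' X_s / n$. Since
$1_N \in \mathrm{col}(W^{-1} X)$, we have

$$\bar y_w - \bar Y = (\bar y - \bar Y) - (\bar x - \bar X)\beta_w + O_p(n^{-1}).$$

From (4.4), it is easy to see that

$$\bar X \beta_w = \tfrac{1}{N} 1_N' X (X' W^{-1} X)^{-1} X' W^{-1} y = \bar Y.$$

Hence

$$\bar y_w - \bar Y = \bar y - \bar x \beta_w + O_p(n^{-1}) = \bar e + O_p(n^{-1}),$$

where

$$\bar e = \frac{1}{n}\sum_1^n e_i, \qquad e_i = y_i - x_i \beta_w.$$

Note that

$$\bar E = \frac{1}{N}\sum_1^N e_i = \frac{1}{N} 1_N'(y - X\beta_w) = \bar Y - \bar X \beta_w = 0.$$

Therefore

$$\bar e = O_p(n^{-0.5})$$

and

$$E[(\bar y_w - \bar Y)^2] = \frac{1-f}{n}\,S_e^2 + O(n^{-2}).$$

This completes the proof of Lemma 6.1. □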


Using Lemma 6.1, we can easily prove the key result

Theorem 6.1. Let $V_{lr}$, $V_w$ denote the leading terms of
$\mathrm{Var}(\bar y_{mlr})$, $\mathrm{Var}(\bar y_w)$ and let
$\beta = (X'X)^{-1} X'y$. For any $\beta_0$, we have

(a) $(y - X\beta_0)'(y - X\beta_0) = (y - X\beta)'(y - X\beta) + (\beta - \beta_0)' X'X (\beta - \beta_0)$.

(b) $V_{lr} \le V_w$, i.e. $\bar y_{mlr}$ (with $w_i = 1$) is the most
efficient estimator among $\bar y_w$.

Proof. Define

$$e = y - X\beta_0 = (y - X\beta) + X(\beta - \beta_0) = d + X(\beta - \beta_0). \tag{6.8}$$

Since $d'X = (y - X\beta)'X = y'(I - X(X'X)^{-1}X')X = 0$, we have

$$e'e = d'd + (\beta - \beta_0)' X'X (\beta - \beta_0). \tag{6.9}$$

This is Part (a). Part (b) follows easily from Part (a) (with
$\beta_0 = \beta_w$) and Lemma 6.1. □



A more intuitive proof of Theorem 6.1 is given below. From Lemma 6.1, we
have

$$\mathrm{Var}(\bar y_w) \approx V_w = \frac{1-f}{n}\,\frac{1}{N-1}\sum_1^N (y_i - x_i \beta_w)^2, \tag{6.10}$$

which is minimized by taking $\beta_w$ to be the unweighted least squares
estimate, since (6.10) is an unweighted sum of squares. For an unequal
probability sampling scheme, (6.10) might be a weighted sum of squares. In
that case $w_i = 1$ is no longer the optimal choice. We have shown that the
most efficient estimator can be obtained by choosing W = I. However, it would
be interesting to compare $\bar y_w$ under different criteria, for example,
the coverage probabilities of the associated t-intervals.